Entropy-SGD: biasing gradient descent into wide valleys
Similar resources
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based obj...
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent
The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the “Wild Track” of the first PAS...
"Oddball SGD": Novelty Driven Stochastic Gradient Descent for Training Deep Neural Networks
Stochastic Gradient Descent (SGD) is arguably the most popular of the machine learning methods applied to training deep neural networks (DNN) today. It has recently been demonstrated that SGD can be statistically biased so that certain elements of the training set are learned more rapidly than others. In this article, we place SGD into a feedback loop whereby the probability of selection is pro...
Learning to learn by gradient descent by gradient descent
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorit...
Empirical Comparison of Gradient Descent and Exponentiated Gradient Descent in
This report describes a series of results using the exponentiated gradient descent (EG) method recently proposed by Kivinen and Warmuth. Prior work is extended by comparing speed of learning on a nonstationary problem and on an extension to backpropagation networks. Most significantly, we present an extension of the EG method to temporal-difference and reinforcement learning. This extension is co...
Journal
Journal title: Journal of Statistical Mechanics: Theory and Experiment
Year: 2019
ISSN: 1742-5468
DOI: 10.1088/1742-5468/ab39d9